vector model


You shall know a piece by the company it keeps. Chess plays as a data for word2vec models

Orekhov, Boris

arXiv.org Artificial Intelligence

In this paper, I apply linguistic methods of analysis to non-linguistic data, metaphorically equating one with the other and seeking analogies. The productivity of this approach has been proven within the field of Super Linguistics. I argue that word embeddings developed by computational linguists (built with the word2vec algorithm) can shed light on the features of chess moves. Recently, computational linguistics has made great progress in natural language processing (NLP). Within this field, tools have been developed for the machine analysis of morphology, syntax and semantics. Computational linguistics has become a stand-alone research area with its own conferences and active subfields, and its experts have developed general principles of text analysis and effective methods of word counting.
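The analogy in the abstract is that a chess game is a "sentence" whose "words" are moves, so a word2vec-style model can be trained on game records. A minimal sketch of the preprocessing this implies: generating the (target, context) skip-gram pairs that word2vec's training objective consumes. The toy games and window size below are illustrative, not the paper's data or settings.

```python
# Treat each chess game as a "sentence" whose "words" are moves in SAN.
# word2vec's skip-gram objective trains on (target, context) pairs drawn
# from a fixed-size window; this sketch generates those pairs.

def skipgram_pairs(games, window=2):
    """Yield (target_move, context_move) pairs from move sequences."""
    for moves in games:
        for i, target in enumerate(moves):
            lo = max(0, i - window)
            hi = min(len(moves), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield target, moves[j]

# Two toy "games" (opening moves only, illustrative data):
games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["d4", "d5", "c4", "e6", "Nc3"],
]
pairs = list(skipgram_pairs(games, window=1))
```

Feeding such pairs to any skip-gram trainer would then place moves that occur in similar contexts close together in the embedding space, just as it does for words.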


Adobe's next-gen Firefly 2 offers vector graphics, more control and photorealistic renders

Engadget

Just seven months after its beta debut, Adobe's Firefly generative AI is set to receive a trio of new models as well as more than 100 new features and capabilities, company executives announced at the Adobe Max 2023 event on Tuesday. The Firefly Image 2 model promises higher-fidelity generated images and more granular controls for users, while the Vector model will allow graphic designers to rapidly generate vector images, an industry first. The Design model for generating print and online advertising layouts offers another first: text-to-template generation. Adobe is no stranger to using machine learning in its products. The company released its earliest commercial AI, Sensei, in 2016.


Deep learning model for Mongolian Citizens Feedback Analysis using Word Vector Embeddings

Dashdorj, Zolzaya, Munkhbayar, Tsetsentsengel, Grigorev, Stanislav

arXiv.org Artificial Intelligence

A large amount of feedback has been collected over the years, and many feedback-analysis models have been developed for the English language. Recognizing the concept of feedback is challenging and crucial in languages that lack the corpora and tools employed in natural language processing (e.g., vocabulary corpora, sentence-structure rules). In this paper, we study feedback classification in the Mongolian language using two different word embeddings for deep learning and compare the results of the proposed approaches. We use feedback data in Cyrillic collected from 2012 to 2018. The results indicate that word embeddings trained on our own dataset improve the proposed deep learning model, with best accuracies of 80.1% and 82.7% on the two classification tasks.


GitHub - explosion/sense2vec: 🦆 Contextually-keyed word vectors

#artificialintelligence

This library is a simple Python implementation for loading, querying and training sense2vec models. To explore the semantic similarities across all Reddit comments of 2015 and 2019, and to try out the pretrained vectors trained on Reddit comments, see the interactive sense2vec demo. Note that the usage examples describe spaCy v3. The repo also includes a Streamlit demo script for exploring vectors and the most similar phrases.


Functional Classification of Bitcoin Addresses

Febrero-Bande, Manuel, González-Manteiga, Wenceslao, Prallon, Brenda, Saporito, Yuri F.

arXiv.org Machine Learning

This paper proposes a classification model for predicting the main activity of bitcoin addresses based on their balances. Since the balances are functions of time, we apply methods from functional data analysis; more specifically, the features of the proposed classification model are the functional principal components of the data. Classifying bitcoin addresses is a relevant problem for two main reasons: to understand the composition of the bitcoin market, and to identify addresses used for illicit activities. Although other bitcoin classifiers have been proposed, they focus primarily on network analysis rather than curve behavior. Our approach, on the other hand, does not require any network information for prediction. Furthermore, functional features have the advantage of being straightforward to build, unlike expert-built features. Results show improvement when combining functional features with scalar features, and similar accuracy for the models using those features separately, which points to the functional model being a good alternative when domain-specific knowledge is not available.
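The abstract's pipeline can be sketched in a few lines: discretize each address's balance curve on a common time grid, center the curves, and take the leading principal components; the per-address scores are the scalar features a downstream classifier would consume. The balance curves below are synthetic, and this plain SVD is only a stand-in for the paper's functional-data machinery.

```python
import numpy as np

# Sketch of functional PCA on discretized balance curves (synthetic data).
# Each row is one address's balance sampled on a common time grid; the
# leading principal components of the centered curves give the scalar
# features a classifier would consume.

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)          # common time grid
n_addresses = 40
# Synthetic balances: a trend plus a seasonal wiggle, with random weights
curves = (rng.normal(size=(n_addresses, 1)) * t
          + rng.normal(size=(n_addresses, 1)) * np.sin(2 * np.pi * t)
          + 0.05 * rng.normal(size=(n_addresses, t.size)))

centered = curves - curves.mean(axis=0)
# SVD of the centered data matrix: rows of Vt are discretized principal
# component functions; U * S gives the per-address scores (features).
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :2] * S[:2]              # first two FPC scores per address
```

Because the synthetic curves are generated from two shapes (a trend and a sinusoid), the first two components capture nearly all the variance, which is exactly the compression the functional features exploit.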


GitHub - plasticityai/magnitude: A fast, efficient universal vector embedding utility package.

#artificialintelligence

A feature-packed Python package and vector storage file format, developed by Plasticity, for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner. It is primarily intended as a simpler and faster alternative to Gensim, but can be used as a generic key-vector store for domains outside NLP. It offers unique features like out-of-vocabulary lookups and streaming of large models over HTTP. It was published in our paper at EMNLP 2018 and is available on arXiv. Google Colaboratory has some issues installing Magnitude due to conflicting dependencies.


Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing and Word-vector Models

Tu, Chaofan, Bai, Ruibin, Lu, Zheng, Aickelin, Uwe, Ge, Peiming, Zhao, Jianshuang

arXiv.org Artificial Intelligence

In this paper, we propose a rule-based engine composed of high-quality, interpretable regular expressions for medical text classification. The regular expressions are auto-generated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods deliver strong performance in most Natural Language Processing (NLP) applications, their solutions are regarded as uninterpretable black boxes. Rule-based methods are therefore often introduced when interpretable solutions are needed, especially in the medical field. However, constructing regular expressions by hand can be extremely labor-intensive for large data sets. This research aims to reduce the manual effort while maintaining high-quality solutions.
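A hand-written miniature of such a rule-based engine shows why it is interpretable: every prediction can be traced to the specific pattern that fired. The patterns, class labels, and matching policy below are made up for illustration; the paper's engine generates and optimizes its regexes automatically rather than writing them by hand.

```python
import re

# Miniature rule-based engine: each class is backed by a small pool of
# regular expressions; a document is assigned the first class whose pool
# matches. Patterns and labels are illustrative, not the paper's.

RULES = [
    ("cardiology", [r"\bchest pain\b", r"\bpalpitation", r"\barrhythmi"]),
    ("respiratory", [r"\bshortness of breath\b", r"\bwheez", r"\bcough\b"]),
]

def classify(text, rules=RULES, default="other"):
    """Return the label of the first rule pool matching the text."""
    text = text.lower()
    for label, patterns in rules:
        if any(re.search(p, text) for p in patterns):
            return label
    return default
```

Unlike a DNN, a misclassification here can be debugged by inspecting which regex matched, which is the interpretability argument the abstract makes.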


Predicting molecular dipole moments by combining atomic partial charges and atomic dipoles

Veit, Max, Wilkins, David M., Yang, Yang, DiStasio, Robert A. Jr., Ceriotti, Michele

arXiv.org Machine Learning

The molecular dipole moment ($\boldsymbol{\mu}$) is a central quantity in chemistry. It is essential in predicting infrared and sum-frequency generation spectra, as well as induction and long-range electrostatic interactions. Furthermore, it can be extracted directly from high-level quantum mechanical calculations, making it an ideal target for machine learning (ML). In this work, we choose to represent this quantity with a physically inspired ML model that captures two distinct physical effects: local atomic polarization is captured within the symmetry-adapted Gaussian process regression (SA-GPR) framework, which assigns a (vector) dipole moment to each atom, while movement of charge across the entire molecule is captured by assigning a partial (scalar) charge to each atom. The resulting "MuML" models are fitted together to reproduce molecular $\boldsymbol{\mu}$ computed using high-level coupled-cluster theory (CCSD) and density functional theory (DFT) on the QM7b dataset. The combined model shows excellent transferability when applied to a showcase dataset of larger and more complex molecules, approaching the accuracy of DFT at a small fraction of the computational cost. We also demonstrate that the uncertainty in the predictions can be estimated reliably using a calibrated committee model. The ultimate performance of the models depends, however, on the details of the system at hand, with the scalar model being clearly superior when describing large molecules whose dipole is almost entirely generated by charge separation. These observations point to the importance of simultaneously accounting for the local and non-local effects that contribute to $\boldsymbol{\mu}$; further, they define a challenging task to benchmark future models, particularly those aimed at the description of condensed phases.
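The physical decomposition the abstract describes — a charge-separation term plus local atomic dipoles — can be written as $\boldsymbol{\mu} = \sum_i q_i \mathbf{r}_i + \sum_i \boldsymbol{\mu}_i$. The sketch below evaluates that sum for a water-like toy geometry; the charges, positions, and atomic dipole values are illustrative numbers, not fitted MuML outputs.

```python
# The MuML decomposition in miniature: the molecular dipole is a sum of a
# charge-separation term (partial charge times position) and local atomic
# dipole vectors. All numerical values below are illustrative.

def molecular_dipole(charges, positions, atomic_dipoles):
    """mu = sum_i q_i * r_i + sum_i mu_i  (componentwise, 3-vectors)."""
    mu = [0.0, 0.0, 0.0]
    for q, r in zip(charges, positions):
        for k in range(3):
            mu[k] += q * r[k]
    for d in atomic_dipoles:
        for k in range(3):
            mu[k] += d[k]
    return mu

# Water-like toy geometry (arbitrary units, illustrative):
charges = [-0.8, 0.4, 0.4]                  # O, H, H partial charges
positions = [(0.0, 0.0, 0.0),
             (0.76, 0.59, 0.0),
             (-0.76, 0.59, 0.0)]
atomic_dipoles = [(0.0, -0.1, 0.0),         # local O polarization
                  (0.0, 0.0, 0.0),
                  (0.0, 0.0, 0.0)]
mu = molecular_dipole(charges, positions, atomic_dipoles)
```

In the paper's model, the scalar charges and vector atomic dipoles are each predicted by machine-learned components (partial charges and SA-GPR, respectively) and then combined exactly as in this sum.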


PAC-Bayes Analysis of Sentence Representation

Nozawa, Kento, Sato, Issei

arXiv.org Machine Learning

Learning sentence vectors from an unlabeled corpus has attracted attention because such vectors can represent sentences in a lower dimensional and continuous space. Simple heuristics using pre-trained word vectors are widely applied to machine learning tasks. However, they are not well understood from a theoretical perspective. We analyze learning sentence vectors from a transfer learning perspective by using a PAC-Bayes bound that enables us to understand existing heuristics. We show that simple heuristics such as averaging and inverse document frequency weighted averaging are derived by our formulation. Moreover, we propose novel sentence vector learning algorithms on the basis of our PAC-Bayes analysis.
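The two heuristics the analysis covers are easy to state concretely: a sentence vector as the plain average of its word vectors, and as an inverse-document-frequency weighted average. The 2-d word vectors and three-document corpus below are toy values chosen for illustration.

```python
import math

# Two sentence-vector heuristics: plain averaging of pre-trained word
# vectors, and IDF-weighted averaging. Word vectors and corpus are toys.

WORD_VECS = {
    "the": [0.1, 0.0], "cat": [0.9, 0.2],
    "sat": [0.3, 0.8], "dog": [0.8, 0.3],
}
CORPUS = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat"]]

def idf(word, corpus):
    """log(N / document frequency) for a word over a tokenized corpus."""
    df = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / df)

def average_vec(sentence):
    dims = len(next(iter(WORD_VECS.values())))
    return [sum(WORD_VECS[w][k] for w in sentence) / len(sentence)
            for k in range(dims)]

def idf_weighted_vec(sentence, corpus=CORPUS):
    weights = [idf(w, corpus) for w in sentence]
    total = sum(weights)
    dims = len(next(iter(WORD_VECS.values())))
    return [sum(wt * WORD_VECS[w][k] for wt, w in zip(weights, sentence))
            / total for k in range(dims)]

plain = average_vec(["the", "cat", "sat"])
weighted = idf_weighted_vec(["the", "cat", "sat"])
```

Note that "the" appears in every document, so its IDF is zero and the weighted average ignores it entirely — the behavior the PAC-Bayes formulation is meant to explain.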


Analyzing Hypersensitive AI: Instability in Corporate-Scale Machine Learning

Regneri, Michaela, Hoffmann, Malte, Kost, Jurij, Pietsch, Niklas, Schulz, Timo, Stamm, Sabine

arXiv.org Artificial Intelligence

Predictive geometric models deliver excellent results for many Machine Learning use cases. Despite their undoubted performance, neural predictive algorithms can show unexpected degrees of instability and variance, particularly when applied to large datasets. We present an approach to measure changes in geometric models with respect to both output consistency and topological stability. Considering the example of a recommender system using word2vec, we analyze the influence of single data points, approximation methods and parameter settings. Our findings can help to stabilize models where needed and to detect differences in informational value of data points on a large scale.
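One concrete way to operationalize the "output consistency" notion above is to compare the top-k nearest-neighbour lists produced by two training runs of the same embedding model, averaging their Jaccard overlap over all query items. The neighbour lists below are illustrative stand-ins for two word2vec runs on the same data, not the paper's measure or results.

```python
# A simple instability measure: for each item, compare the top-k
# nearest-neighbour lists from two training runs of the same embedding
# model via Jaccard overlap, then average. Neighbour lists are toys.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def mean_consistency(run_a, run_b):
    """Average top-k neighbour overlap over all shared query items."""
    items = run_a.keys() & run_b.keys()
    return sum(jaccard(run_a[i], run_b[i]) for i in items) / len(items)

# Hypothetical top-3 neighbours from two runs of a recommender embedding:
run_a = {"shoe": ["boot", "sneaker", "sandal"],
         "shirt": ["blouse", "tee", "jacket"]}
run_b = {"shoe": ["boot", "sneaker", "slipper"],
         "shirt": ["tee", "blouse", "jacket"]}
score = mean_consistency(run_a, run_b)
```

A score near 1.0 indicates stable recommendations across runs; systematic drops after adding single data points are exactly the hypersensitivity the paper investigates.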